28 research outputs found

    Using Ontologies for the Design of Data Warehouses

    Implementing a data warehouse is a complex task that forces designers to acquire wide knowledge of the domain, thus requiring a high level of expertise and making the task prone to failure. Based on our experience, we have identified a set of situations encountered in real-world projects in which we believe that the use of ontologies will improve several aspects of data warehouse design. The aim of this article is to describe several shortcomings of current data warehouse design approaches and to discuss the benefit of using ontologies to overcome them. This work is a starting point for discussing the suitability of ontologies for data warehouse design. Comment: 15 pages, 2 figures.

    Modelling ETL processes of data warehouses with UML activity diagrams

    Extraction-transformation-loading (ETL) processes play an important role in a data warehouse (DW) architecture because they are responsible for integrating data from heterogeneous data sources into the DW repository. Importantly, most of the budget of a DW project is spent on designing these processes, since they are not taken into account in the early phases of the project but only once the repository is deployed. To overcome this situation, we propose using the Unified Modelling Language (UML) to conceptually model the sequence of activities involved in ETL processes from the beginning of the project by means of activity diagrams (ADs). Our approach provides designers with easy-to-use modelling elements to capture the dynamic aspects of ETL processes.

    Definición y validación de medidas para procesos ETL en almacenes de datos

    In data warehousing, ETL (Extract, Transform, and Load) processes are in charge of extracting from the data sources the data that will be contained in the data warehouse. Due to their relevance, the quality of these processes should be formally assessed from the early stages of development, in order to avoid bad decisions resulting from incorrect data. In this paper, a set of measures is presented to evaluate the structural complexity of ETL process models at the conceptual level. Moreover, this study is accompanied by one controlled experiment whose aim is the empirical validation of the proposed measures. The use of these measures can help designers predict the effort associated with the maintenance tasks of ETL processes. This proposal is based on UML (Unified Modeling Language) activity diagrams for modeling ETL processes, and on the FMESP (Framework for the Modeling and Evaluation of Software Processes) framework for the validation of the measures.

    MEETING INTERNACIONAL DE EDIMBURGO [11 - 13 marzo 2022]

    Competition analysis: individual reports [double format], Meeting Internacional de Edimburgo [11-13 March 2022]. Real Federación Española de Natación

    Towards readable layouts for modeling data warehouses

    Data warehouses are large-scale databases that are usually managed by means of diagram-based conceptual models. However, the complexity of those models often imposes significant design challenges. In particular, this article studies their different underlying graph layouts. The working hypothesis is that graph layouts influence diagram readability, which in turn is significant for facilitating the design process. We define the main viewpoints involved in conceptual modeling. For each one, both surveyed and alternative layouts were evaluated against a set of aesthetics and efficiency measures. As a result, more readable graph layouts than those found in the literature were identified.

    Towards a model-driven engineering approach of data mining

    Nowadays, data mining is based on low-level specifications of the employed techniques, typically bound to a specific analysis platform. Therefore, data mining lacks a modelling architecture that allows analysts to treat it as a true software-engineering process. Here, we propose a model-driven approach based on (i) a conceptual modelling framework for data mining, and (ii) a set of model transformations to automatically generate both the data under analysis (via data-warehousing technology) and the analysis models for data mining (tailored to a specific platform). Thus, analysts can concentrate on the analysis problem via conceptual data-mining models instead of low-level programming tasks related to the technical details of the underlying platform. These tasks are now entrusted to the model-transformation scaffolding. This work has been supported by the ESPIA (TIN2007-67078) project (Spanish Ministry of Education), and by the QUASIMODO (PAC08-0157-0668) project (Castilla-La Mancha Ministry of Education). Jesús Pardillo and Jose-Norberto Mazón are funded by the Spanish Ministry of Education (FPU grants AP2006-00332 and AP2005-1360).

    Specifying aggregation functions in multidimensional models with OCL

    Multidimensional models are at the core of data warehouse systems, since they allow decision makers to define early on the relevant information and queries required to satisfy their information needs. The use of aggregation functions is a cornerstone in the definition of these multidimensional queries. However, current proposals for multidimensional modeling lack the mechanisms to define aggregation functions at the conceptual level: multidimensional queries can only be defined once the rest of the system has already been implemented, which requires much effort and expertise. The goal of this paper is therefore to extend the Object Constraint Language (OCL) with a predefined set of aggregation functions. Our extension facilitates the definition of platform-independent queries as part of the specification of the conceptual multidimensional model of the data warehouse. These queries are automatically implemented with the rest of the data warehouse during the code-generation phase. The OCL extensions proposed in this paper have been validated by using the USE tool. Work supported by the projects: TIN2008-00444, ESPIA (TIN2007-67078) from the Spanish Ministry of Education and Science (MEC), QUASIMODO (PAC08-0157-0668) from the Castilla-La Mancha Ministry of Education and Science (Spain), and DEMETER (GVPRE/2008/063) from the Valencia Government (Spain). Jesús Pardillo is funded by MEC under FPU grant AP2006-00332.

    Towards the conceptual specification of statistical functions with OCL

    Current proposals for designing information systems lack the mechanisms to define statistical functions at the conceptual level. Therefore, queries containing this kind of function are defined only once the rest of the system has already been implemented, which requires much effort and expertise. The goal of this paper is therefore to show the benefits of extending the Object Constraint Language (OCL) with a predefined set of statistical functions. Work supported by the projects: TIN2008-00444, ESPIA (TIN2007-67078) from the Spanish Ministry of Education and Science (MEC), QUASIMODO (PAC08-0157-0668) from the Castilla-La Mancha Ministry of Education and Science (Spain), and DEMETER (GVPRE/2008/063) from the Valencia Government (Spain). Jose-Norberto Mazón and Jesús Pardillo are funded by MEC under FPU grants AP2005-1360 and AP2006-00332, respectively. Jordi Cabot is funded by the 2007 BP-A 00128 grant (Catalan Government).